HHpred / HHsearch

HHsearch
Developer(s): Johannes Söding
Stable release: 1.5.1 / 23 October 2008
Written in: C++
Available in: English
Type: Bioinformatics tool
License: Creative Commons Attribution-NonCommercial-2.0
Website: ftp://toolkit.lmb.uni-muenchen.de/HHsearch/

HHsearch is a program for protein sequence searching that is free for non-commercial use.[1] HHpred is a free protein function and protein structure prediction server based on the HHsearch method.[2] HHpred/HHsearch are among the most popular methods for protein structure prediction and the detection of remotely related sequences, having been cited over 700 times.[3]

Description

Biologists frequently perform sequence searches to infer the function of an unknown protein from its sequence. For this purpose, the protein's sequence is compared to the sequences of other proteins in public databases, and its function is deduced from those of the most similar sequences. Often, no sequences with annotated functions can be found in such a search. In this case, more sensitive methods are required to identify more remotely related proteins or protein families. From these relationships, hypotheses about the protein's function, structure, and domain composition can be inferred. HHsearch searches databases with a query protein sequence. The HHpred server and the HHsearch software package offer many popular, regularly updated databases, such as the Protein Data Bank, as well as the InterPro, Pfam, COG, and SCOP databases.

HHpred/HHsearch belongs to the class of profile-profile comparison tools, which includes the most sensitive sequence search methods to date.[1][4][5][6] These tools represent both the query sequence and the database sequences by sequence profiles, also called position-specific scoring matrices (PSSMs). Profiles are calculated from a multiple sequence alignment of related sequences, which are typically collected using the PSI-BLAST program from the National Center for Biotechnology Information (NCBI). A profile is a matrix that contains, for each position in the query sequence, a similarity score for each of the 20 amino acids. These scores are calculated from the frequencies of the amino acids at the corresponding positions in the multiple sequence alignment. Because profiles contain much more information than a single sequence (e.g. the position-specific degree of conservation), profile-profile comparison methods are much more powerful than sequence-sequence comparison methods like BLAST or profile-sequence comparison methods like PSI-BLAST.[4]
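To make the idea of a sequence profile concrete, the following minimal Python sketch turns a toy multiple sequence alignment into a matrix of log-odds scores, one row per alignment column and one score per amino acid. It is an illustration only, not the HHsearch or PSI-BLAST implementation: the function name build_profile, the uniform background frequency, and the add-one pseudocounts are simplifying assumptions made here, and real tools additionally weight sequences and use empirical background frequencies.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one {amino acid: log-odds score} dict per alignment column.

    Assumptions (illustrative only): all sequences have equal weight,
    the background frequency is uniform (1/20), and a flat pseudocount
    is added to every amino acid count.
    """
    background = 1.0 / len(AMINO_ACIDS)
    length = len(alignment[0])
    profile = []
    for col in range(length):
        # Count residues in this column; gaps and unknown symbols are ignored.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        row = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            # Positive score: amino acid is more frequent than background.
            row[aa] = math.log2(freq / background)
        profile.append(row)
    return profile


# Toy alignment of four sequences with three columns ("-" is a gap).
toy_alignment = ["ACD", "ACE", "AGD", "A-D"]
profile = build_profile(toy_alignment)
```

In this toy alignment the first column is fully conserved, so alanine receives a positive score there and every other amino acid a negative one, which is the position-specific conservation signal that a single sequence cannot express.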

HHpred/HHsearch represents query and database proteins by profile hidden Markov models (HMMs), an extension of sequence profiles that also records position-specific amino acid insertion and deletion frequencies. HHsearch searches a database of HMMs with a query HMM. Before starting the search through the actual database of HMMs, HHsearch/HHpred builds a multiple sequence alignment of related sequences using a context-specific version of PSI-BLAST, called CSI-BLAST. From this alignment, a profile HMM is calculated. The databases contain HMMs that are precalculated in the same fashion using PSI-BLAST. The output of HHpred and HHsearch is a ranked list of database matches (including E-values and probabilities for a true relationship) and the pairwise query-database sequence alignments. A search through the PDB database of proteins with solved 3D structure takes a few minutes. If a significant match with a protein of known structure (a "template") is found in the PDB database, HHpred allows the user to build a homology model with the MODELLER software, starting from the pairwise query-template alignment.
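The position-specific insertion and deletion frequencies that distinguish a profile HMM from a plain profile can be illustrated with a small Python sketch. It is a simplified counting scheme under stated assumptions, not the HHsearch algorithm: the function name indel_frequencies is hypothetical, the first sequence of the alignment is assumed to be the query, every column in which the query has a residue is treated as a match state, and all sequences are weighted equally.

```python
def indel_frequencies(alignment):
    """Estimate per-position deletion and insertion frequencies.

    Assumptions (illustrative only): alignment[0] is the query, and a
    column is a match column exactly when the query has a residue there.
    Returns two lists with one value per match column:
      deletions[k]  - fraction of sequences with a gap in match column k
      insertions[k] - fraction of sequences with at least one residue in
                      the non-match columns that follow match column k
    """
    query = alignment[0]
    n_seqs = len(alignment)
    match_cols = [c for c, ch in enumerate(query) if ch != "-"]

    deletions = []
    insertions = []
    for k, col in enumerate(match_cols):
        deletions.append(sum(seq[col] == "-" for seq in alignment) / n_seqs)
        # Non-match columns between this match column and the next one.
        end = match_cols[k + 1] if k + 1 < len(match_cols) else len(query)
        inserted = sum(
            any(ch != "-" for ch in seq[col + 1:end]) for seq in alignment
        )
        insertions.append(inserted / n_seqs)
    return deletions, insertions


# Toy alignment; the query (first row) has a gap in the third column.
toy_alignment = [
    "AC-DE",
    "ACGDE",
    "A--DE",
    "ACG-E",
]
deletions, insertions = indel_frequencies(toy_alignment)
# deletions  -> [0.0, 0.25, 0.25, 0.0]
# insertions -> [0.0, 0.5, 0.0, 0.0]
```

Here half of the sequences carry an extra residue after the second query position, and one sequence in four lacks a residue at the second and third match positions; a profile HMM stores this kind of information per position alongside the amino acid scores.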

Applications

Applications of HHpred/HHsearch include protein structure prediction, function prediction, domain prediction, domain boundary prediction, and evolutionary classification of proteins.

CASP Rankings

HHpred servers have been ranked among the best servers during the last three CASP blind protein structure prediction experiments. In the most recent of these, CASP9, HHpredA, HHpredB, and HHpredC were ranked 1st, 2nd, and 3rd out of 81 participating automatic structure prediction servers in template-based modeling[7] and 6th, 7th, and 8th on all 147 targets, while being much faster than the best 20 servers.[8] In CASP8, HHpred was ranked 7th on all targets and 2nd on the subset of single-domain proteins, while still being more than 50 times faster than the top-ranked servers.[9]

References

  1. Söding J (2005). "Protein homology detection by HMM-HMM comparison". Bioinformatics 21 (7): 951–960. doi:10.1093/bioinformatics/bti125. PMID 15531603.
  2. Söding J, Biegert A, Lupas AN (2005). "The HHpred interactive server for protein homology detection and structure prediction". Nucleic Acids Research 33 (Web Server issue): W244–248. doi:10.1093/nar/gki408. PMC 1160169. PMID 15980461. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=1160169.
  3. Number of results returned from a search on Google Scholar. (Google Scholar search)
  4. Jaroszewski L, Rychlewski L, Godzik A (2000). "Improving the quality of twilight-zone alignments". Protein Science 9 (8): 1487–1496. doi:10.1110/ps.9.8.1487. PMC 2144727. PMID 10975570. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=2144727.
  5. Sadreyev RI, Baker D, Grishin NV (2003). "Profile-profile comparisons by COMPASS predict intricate homologies between protein families". Protein Science 12 (10): 2262–2272. doi:10.1110/ps.03197403. PMC 2366929. PMID 14500884. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=2366929.
  6. Dunbrack RL Jr (2006). "Sequence comparison and protein structure prediction". Current Opinion in Structural Biology 16 (3): 374–384. doi:10.1016/j.sbi.2006.05.006. PMID 16713709.
  7. Official CASP9 results for the template-based modeling category (121 targets)
  8. Official CASP9 results for all 147 targets
  9. Hildebrand A, Remmert M, Biegert A, Söding J (2009). "Fast and accurate automatic structure prediction with HHpred". Proteins 77 Suppl 9: 128–132. doi:10.1002/prot.22499. PMID 19626712.
